The overarching goal of CoRR is to create an open science resource for the imaging community that will facilitate the assessment of test-retest reliability and reproducibility for functional and structural connectomics. In order to accomplish this, we will aggregate resting state fMRI (R-fMRI) and diffusion imaging data across laboratories around the world, and share the data via the International Neuroimaging Data-sharing Initiative (INDI) to enable the:
Contributors: Any laboratory willing to openly share multimodal imaging datasets (including, at a minimum, an R-fMRI scan and a corresponding anatomical image) with at least one retest occasion. Institutional IRB/ethics committee approval or waiver (see below) is required prior to contribution of data.
MRI data: Our primary focus is on R-fMRI data, with a secondary focus on diffusion imaging data. While we encourage sharing data with minimal movement, we impose no exclusion criteria for motion. This decision reflects three considerations: 1) there is no consensus on acceptable criteria for movement in functional MRI or diffusion imaging data, 2) high-motion datasets are essential for determining the impact of motion on reliability, and 3) new approaches continue to be developed to account for movement artifacts. We also encourage submission of data from other modalities (e.g., ASL) or experimental paradigms (e.g., task data) when available for the same participants for whom R-fMRI data are being provided.
Phenotypic data: Given that this is a retrospective data collection, we will focus on basic phenotypic measures that are relatively standard in the neuroimaging field and fundamental for analyses and sample characterization. Our phenotypic key is organized into three classes of variables: 1) core (i.e., the minimal variables required to characterize any dataset), 2) preferred (i.e., variables that are strongly suggested for inclusion because of their relative importance and/or the likelihood that most sites collected them), and 3) optional (i.e., variables that are dataset-specific or shared by only a few sites).
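As an illustration only, a contributing site could check a phenotypic file against a three-tier key before upload. The sketch below is not part of the CoRR tooling: the variable names in `PHENOTYPIC_KEY` and the function `check_phenotypic_columns` are hypothetical placeholders, and the actual key should be taken from the CoRR instructions.

```python
# Hypothetical three-tier key. The variable names are placeholders for
# illustration and are NOT the actual CoRR phenotypic key.
PHENOTYPIC_KEY = {
    "age": "core",
    "sex": "core",
    "handedness": "preferred",
    "site_specific_score": "optional",
}


def check_phenotypic_columns(columns, key=PHENOTYPIC_KEY):
    """Report missing core/preferred variables and columns absent from the key."""
    present = set(columns)
    report = {"missing_core": [], "missing_preferred": [], "unrecognized": []}
    for name, tier in key.items():
        # Optional variables are never reported as missing.
        if name not in present and tier in ("core", "preferred"):
            report["missing_" + tier].append(name)
    report["missing_core"].sort()
    report["missing_preferred"].sort()
    # Columns not listed in the key may need review before sharing.
    report["unrecognized"] = sorted(present - set(key))
    return report


if __name__ == "__main__":
    print(check_phenotypic_columns(["age", "handedness", "iq"]))
```

Under these assumptions, a missing core variable would block a contribution, a missing preferred variable would only prompt a follow-up, and unrecognized columns would be flagged for manual review.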
Aggregation pipeline: CoRR data aggregation will be carried out through an INDI portal located in the COINS database (http://coins.mrn.org). Application instructions will be provided, along with instructions for defacing images, removing protected health information (PHI), and uploading data. After upload, data will be checked for quality, formatting, and the presence of any PHI before final download to the site.
IRB: Contributed data must have been collected through a research study approved by the applicable institutional review board (IRB) or ethics committee. Unrestricted public sharing of data via CoRR must be approved or exempted by the contributing site’s IRB/ethics committee. Importantly, consistent with the established policy of the 1000 Functional Connectomes Project, all data shared via the CoRR effort are de-identified per HIPAA guidelines (i.e., removal of all 18 protected health information identifiers), including the removal of any face information from the image. Each contributor is responsible for confirming with their local ethics committee or IRB whether IRB approval or exemption is required. If such documentation is required, we ask that the final documentation be provided before the time of contribution. Please find below sample text that can be used for your IRB or ethics committee:
“The CoRR Project data sharing effort will provide the research community with open access to datasets contributed by labs from throughout the world. The CoRR repository will be entirely anonymized (i.e., any personal identifying information will be removed from header/support files). All datasets will be de-identified by the contributor in compliance with HIPAA protocols, including the removal of any face information from the images, prior to transferring the data to CoRR staff. Additionally, prior to transfer to the FCP/INDI, each individual participant’s dataset is assigned a randomized seven-digit participant identifier. Upon receipt, datasets are automatically organized and header files are replaced with anonymized header files to further guarantee that any remaining identifying personal information within the header or supporting files has been removed. All phenotypic variables are reviewed prior to distribution to ensure that no personally identifying information or variables are included. For contributions that include clinical populations, we will also include categorical diagnostic labels and severity measures (when available). Any additional physiological monitoring data provided by contributing sites will be made available for public release, as long as it in no way compromises privacy or reveals protected health information. This information will serve to facilitate more careful characterization of the sample, without entailing risk of violation of confidentiality. Datasets are only included in the repository after investigators provide express written permission for the general public to use the dataset freely, without limitation.”
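The following is not part of the sample IRB text. It is a minimal sketch of how a contributing site might assign the randomized seven-digit participant identifiers mentioned above, under stated assumptions: the function name `assign_participant_ids` is hypothetical, identifiers are assumed not to start with zero, and the CoRR instructions may prescribe a different procedure.

```python
import secrets


def assign_participant_ids(local_ids, digits=7):
    """Map each local ID to a unique random identifier of `digits` digits.

    Assumption (not stated by CoRR): identifiers never start with zero.
    """
    low = 10 ** (digits - 1)   # smallest value with the requested digit count
    span = 9 * low             # how many such values exist
    unique_local = list(dict.fromkeys(local_ids))  # drop repeats, keep order
    if len(unique_local) > span:
        raise ValueError("more participants than available identifiers")
    mapping = {}
    used = set()
    for local_id in unique_local:
        # Redraw until the candidate has not been assigned to anyone else.
        while True:
            candidate = low + secrets.randbelow(span)
            if candidate not in used:
                break
        used.add(candidate)
        mapping[local_id] = str(candidate)
    return mapping


if __name__ == "__main__":
    print(assign_participant_ids(["sub-A", "sub-B", "sub-C"]))
```

Because the identifiers are drawn at random rather than derived from the local IDs, the mapping itself is the only link back to the original participants and should remain at the contributing site.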